如最近的研究所示,支持机器智能的系统容易受到对抗性操纵或自然分配变化产生的测试案例的影响。这引起了人们对现实应用程序部署机器学习算法的极大关注,尤其是在自动驾驶(AD)等安全性领域中。另一方面,由于自然主义场景的传统广告测试需要数亿英里,这是由于现实世界中安全关键方案的高度和稀有性。结果,已经探索了几种自动驾驶评估方法,但是,这些方法通常是基于不同的仿真平台,安全性 - 关键的情况的类型,场景生成算法和驾驶路线变化的方法。因此,尽管在自动驾驶测试方面进行了大量努力,但在相似条件下,比较和了解不同测试场景产生算法和测试机制的有效性和效率仍然是一项挑战。在本文中,我们旨在提供第一个统一的平台Safebench,以整合不同类型的安全性测试方案,场景生成算法以及其他变体,例如驾驶路线和环境。同时,我们实施了4种基于深入学习的AD算法,具有4种类型的输入(例如,鸟类视图,相机,相机),以对SafeBench进行公平的比较。我们发现,我们的生成的测试场景确实更具挑战性,并观察到在良性和关键安全测试方案下的广告代理的性能之间的权衡。我们认为,我们的统一平台安全基地用于大规模和有效的自动驾驶测试,将激发新的测试场景生成和安全AD算法的开发。 SafeBench可从https://safebench.github.io获得。
translated by 谷歌翻译
人工智能的一种令人信服的应用是生成一个目标人执行任意所需运动的视频(来自来源的人)。虽然最新的方法能够合成一个视频,展示了类似的宽带运动细节,但它们通常缺乏纹理细节。相关的表现出现为扭曲的脸,脚和手,这种缺陷是人类观察者对人的非常敏感的。此外,当前的方法通常采用L2损失的GAN来评估生成的视频的真实性,固有地需要大量的培训样品来学习纹理细节以进行足够的视频生成。在这项工作中,我们从三个方面应对这些挑战:1)我们将每个视频框架分解为前景(人)和背景,重点是生成前景,以减少网络输出的基本维度。 2)我们提出了一种理论上动机的Gromov-Wasserstein损失,可促进从姿势到前景图像学习地图。 3)为了增强纹理细节,我们用几何指导编码面部特征,并使用当地甘斯来完善面部,脚和手。广泛的实验表明,我们的方法能够生成现实的目标人视频,忠实地从源人员那里复制复杂的动作。我们的代码和数据集在https://github.com/sifann/fakemotion上发布
translated by 谷歌翻译
最近的研究表明,预训练的语言模型(LMS)容易受到文本对抗性攻击的影响。但是,现有的攻击方法要么遭受低攻击成功率,要么无法在指数级的扰动空间中有效搜索。我们提出了一个有效有效的框架Semattack,以通过构建不同的语义扰动函数来生成自然的对抗文本。特别是,Semattack优化了对通用语义空间约束的生成的扰动,包括错字空间,知识空间(例如WordNet),上下文化的语义空间(例如,BERT群集的嵌入空间)或这些空间的组合。因此,生成的对抗文本在语义上更接近原始输入。广泛的实验表明,最新的(SOTA)大规模LMS(例如Deberta-V2)和国防策略(例如Freelb)仍然容易受到Semattack的影响。我们进一步证明,Semattack是一般的,并且能够为具有较高攻击成功率的不同语言(例如英语和中文)生成自然的对抗文本。人类评估还证实,我们产生的对抗文本是自然的,几乎不会影响人类的表现。我们的代码可在https://github.com/ai-secure/semattack上公开获取。
translated by 谷歌翻译
大规模的预训练语言模型在广泛的自然语言理解(NLU)任务中取得了巨大的成功,甚至超过人类性能。然而,最近的研究表明,这些模型的稳健性可能受到精心制作的文本对抗例子的挑战。虽然已经提出了几个单独的数据集来评估模型稳健性,但仍缺少原则和全面的基准。在本文中,我们呈现对抗性胶水(AdvGlue),这是一个新的多任务基准,以定量和彻底探索和评估各种对抗攻击下现代大规模语言模型的脆弱性。特别是,我们系统地应用14种文本对抗的攻击方法来构建一个粘合的援助,这是由人类进一步验证的可靠注释。我们的调查结果总结如下。 (i)大多数现有的对抗性攻击算法容易发生无效或暧昧的对手示例,其中大约90%的含量改变原始语义含义或误导性的人的注册人。因此,我们执行仔细的过滤过程来策划高质量的基准。 (ii)我们测试的所有语言模型和强大的培训方法在AdvGlue上表现不佳,差价远远落后于良性准确性。我们希望我们的工作能够激励开发新的对抗攻击,这些攻击更加隐身,更加统一,以及针对复杂的对抗性攻击的新强大语言模型。 Advglue在https://adversarialglue.github.io提供。
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
translated by 谷歌翻译
As one of the most important psychic stress reactions, micro-expressions (MEs), are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite the recent efforts of several spontaneous ME datasets to alleviate this problem, it is still a tiny amount of work. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments to objectively verify the validity of DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER respectively on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
translated by 谷歌翻译
Face Anti-spoofing (FAS) is essential to secure face recognition systems from various physical attacks. However, recent research generally focuses on short-distance applications (i.e., phone unlocking) while lacking consideration of long-distance scenes (i.e., surveillance security checks). In order to promote relevant research and fill this gap in the community, we collect a large-scale Surveillance High-Fidelity Mask (SuHiFiMask) dataset captured under 40 surveillance scenes, which has 101 subjects from different age groups with 232 3D attacks (high-fidelity masks), 200 2D attacks (posters, portraits, and screens), and 2 adversarial attacks. In this scene, low image resolution and noise interference are new challenges faced in surveillance FAS. Together with the SuHiFiMask dataset, we propose a Contrastive Quality-Invariance Learning (CQIL) network to alleviate the performance degradation caused by image quality from three aspects: (1) An Image Quality Variable module (IQV) is introduced to recover image information associated with discrimination by combining the super-resolution network. (2) Using generated sample pairs to simulate quality variance distributions to help contrastive learning strategies obtain robust feature representation under quality variation. (3) A Separate Quality Network (SQN) is designed to learn discriminative features independent of image quality. Finally, a large number of experiments verify the quality of the SuHiFiMask dataset and the superiority of the proposed CQIL.
translated by 谷歌翻译
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
translated by 谷歌翻译
Panoptic Part Segmentation (PPS) unifies panoptic segmentation and part segmentation into one task. Previous works utilize separated approaches to handle thing, stuff, and part predictions without shared computation and task association. We aim to unify these tasks at the architectural level, designing the first end-to-end unified framework named Panoptic-PartFormer. Moreover, we find the previous metric PartPQ biases to PQ. To handle both issues, we make the following contributions: Firstly, we design a meta-architecture that decouples part feature and things/stuff feature, respectively. We model things, stuff, and parts as object queries and directly learn to optimize all three forms of prediction as a unified mask prediction and classification problem. We term our model as Panoptic-PartFormer. Secondly, we propose a new metric Part-Whole Quality (PWQ) to better measure such task from both pixel-region and part-whole perspectives. It can also decouple the error for part segmentation and panoptic segmentation. Thirdly, inspired by Mask2Former, based on our meta-architecture, we propose Panoptic-PartFormer++ and design a new part-whole cross attention scheme to further boost part segmentation qualities. We design a new part-whole interaction method using masked cross attention. Finally, the extensive ablation studies and analysis demonstrate the effectiveness of both Panoptic-PartFormer and Panoptic-PartFormer++. Compared with previous Panoptic-PartFormer, our Panoptic-PartFormer++ achieves 2% PartPQ and 3% PWQ improvements on the Cityscapes PPS dataset and 5% PartPQ on the Pascal Context PPS dataset. On both datasets, Panoptic-PartFormer++ achieves new state-of-the-art results with a significant cost drop of 70% on GFlops and 50% on parameters. Our models can serve as a strong baseline and aid future research in PPS. Code will be available.
translated by 谷歌翻译